TTS and STT Vendors
Druid provides a flexible, speech-provider-agnostic architecture that lets you integrate with the industry's leading Text-to-Speech (TTS) and Speech-to-Text (STT) engines. Because you are not tied to a single vendor, you can select the voice models that perform best for your specific language, region, or industry requirements.
As the conversational AI landscape evolves, we regularly extend this list to include emerging technologies and specialized providers.
Druid SIP
The following table lists the voice service providers supported for Druid SIP integrations and shows whether each provider's TTS and STT services are available in cloud and on-premises deployments.
| Provider | Cloud TTS | Cloud STT | On-premises TTS | On-premises STT |
|---|---|---|---|---|
| Druid | yes | yes | no | no |
| Azure | yes | yes | yes | yes |
| ElevenLabs | yes | yes | yes | yes |
| Soniox | yes | yes | yes | yes |
| Deepgram | yes | yes | no | yes |
| Mistral AI (Voxtral speech model) | yes | yes | yes | yes |
WebChat Voice Channel
For web-based interactions, the WebChat Voice Channel supports the following providers for low-latency, high-accuracy speech processing. On this channel, Deepgram, Soniox, and Speechmatics are available for STT only.
| Provider | Cloud TTS | Cloud STT | On-premises TTS | On-premises STT |
|---|---|---|---|---|
| Druid | yes | yes | no | no |
| Microsoft | yes | yes | yes | yes |
| ElevenLabs | yes | yes | yes | yes |
| Deepgram | no | yes | no | yes |
| Soniox | no | yes | no | yes |
| Speechmatics | no | yes | no | yes |
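When planning a deployment, it can help to treat the two support tables as data you can query. The sketch below is an illustration only, not a Druid API: the names `SUPPORT` and `providers_for`, and the keys `"sip"`, `"webchat"`, `"cloud"`, `"on_premises"`, `"tts"`, and `"stt"`, are hypothetical. The data simply mirrors the tables above.

```python
# Hypothetical helper (not part of Druid): the SIP and WebChat support tables as data.
# Tuple order per provider: (cloud TTS, cloud STT, on-premises TTS, on-premises STT)
SUPPORT: dict[str, dict[str, tuple[bool, bool, bool, bool]]] = {
    "sip": {
        "Druid": (True, True, False, False),
        "Azure": (True, True, True, True),
        "ElevenLabs": (True, True, True, True),
        "Soniox": (True, True, True, True),
        "Deepgram": (True, True, False, True),
        "Mistral AI (Voxtral speech model)": (True, True, True, True),
    },
    "webchat": {
        "Druid": (True, True, False, False),
        "Microsoft": (True, True, True, True),
        "ElevenLabs": (True, True, True, True),
        "Deepgram": (False, True, False, True),
        "Soniox": (False, True, False, True),
        "Speechmatics": (False, True, False, True),
    },
}

# Maps each (deployment, capability) pair to its position in the tuples above.
_COLUMN: dict[tuple[str, str], int] = {
    ("cloud", "tts"): 0,
    ("cloud", "stt"): 1,
    ("on_premises", "tts"): 2,
    ("on_premises", "stt"): 3,
}


def providers_for(channel: str, deployment: str, capability: str) -> list[str]:
    """Return the providers that support `capability` for `deployment` on `channel`,
    in the same order as the corresponding table."""
    column = _COLUMN[(deployment, capability)]
    return [name for name, row in SUPPORT[channel].items() if row[column]]


if __name__ == "__main__":
    print(providers_for("webchat", "on_premises", "tts"))
```

For example, `providers_for("webchat", "on_premises", "tts")` returns `["Microsoft", "ElevenLabs"]`. Keeping the matrix as data rather than as hard-coded conditions makes it straightforward to update when new providers are added.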